Cluster-discovery of Twitter messages for event detection and trending

Authors

  • Shakira Banu Kaleel
  • Abdolreza Abhari
Abstract

Social media data carries abundant hidden traces of real-time events around the world, which raises the demand for efficient event detection and trending systems. The Locality Sensitive Hashing (LSH) technique is capable of processing large-scale datasets. In this thesis, a novel framework is proposed for detecting and trending events from tweet clusters, discovered using LSH, that are present in a Twitter dataset. The experimental results showed that LSH took only 12.99% of the running time that K-means required to find all of the tweet clusters. Key challenges include: 1) constructing a dictionary using incremental TF-IDF in high-dimensional data in order to create tweet feature vectors; 2) leveraging LSH to find truly interesting events; 3) trending the behavior of events based on time, geo-location, and cluster size; and 4) speeding up the cluster-discovery process while retaining cluster quality.
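As a rough illustration of the pipeline the abstract outlines (TF-IDF feature vectors, then LSH to group similar tweets), the following is a minimal sketch and not the authors' implementation. It assumes a random-hyperplane (sign-of-projection) LSH family and a batch, smoothed TF-IDF rather than the incremental variant the abstract mentions; the function names `tfidf_vectors` and `lsh_buckets`, the parameters `n_bits` and `seed`, and the sample tweets are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return (vocab, vectors) where each vector is a sparse {term: weight} dict.

    Assumption: batch TF-IDF with smoothed IDF, not the incremental scheme
    named in the abstract.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        length = max(len(doc), 1)  # guard against empty tweets
        vectors.append({
            term: (count / length)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vocab, vectors


def lsh_buckets(vectors, vocab, n_bits=8, seed=0):
    """Group vector indices by their random-hyperplane signature.

    Each of the n_bits hyperplanes has Gaussian weights; a signature bit is
    True when the vector lies on the non-negative side of that hyperplane.
    Vectors with a small angle between them tend to share a signature.
    """
    rng = random.Random(seed)
    planes = [{term: rng.gauss(0.0, 1.0) for term in vocab}
              for _ in range(n_bits)]
    buckets = defaultdict(list)
    for index, vector in enumerate(vectors):
        signature = tuple(
            sum(weight * plane[term] for term, weight in vector.items()) >= 0
            for plane in planes
        )
        buckets[signature].append(index)
    return dict(buckets)


# Hypothetical sample tweets, already tokenised.
tweets = [
    "earthquake hits city centre".split(),
    "earthquake hits city centre".split(),
    "new phone launch today".split(),
]
vocab, vectors = tfidf_vectors(tweets)
for signature, members in lsh_buckets(vectors, vocab).items():
    print(members)
```

Each bucket is a candidate tweet cluster. Because only one hash per tweet is computed, rather than a distance to every centroid on every iteration as in K-means, this style of grouping is typically much cheaper, which is consistent in spirit with the running-time saving the abstract reports. Near-duplicates can still fall on opposite sides of a hyperplane, so practical systems usually combine several hash tables.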

Similar resources

Beyond Trending Topics: Real-World Event Identification on Twitter

User-contributed messages on social media sites such as Twitter have emerged as powerful, real-time means of information sharing on the Web. These short messages tend to reflect a variety of events in real time, making Twitter particularly well suited as a source of real-time event content. In this paper, we explore approaches for analyzing the stream of Twitter messages to distinguish between ...

Monitoring Spatial Coverage of Trending Topics in Twitter

Most messages posted in Twitter usually discuss an ongoing event, triggering a series of tweets that together may constitute a trending topic (e.g., #election2012, #jesuischarlie, #oscars2016). Sometimes, such a topic may be trending only locally, assuming that related posts have a geographical reference, either directly geotagging them with exact coordinates or indirectly by mentioning a well-...

Twitter Trends Detection by Identifying Grammatical Relations

The problem considered in this paper relates to identification of trends in a given area based on analysis of Twitter messages. The approaches currently used for Twitter trends detection are based on n-grams. We propose another approach of trend detection based on identifying trend as grammatical relation and perform the identification of trending relations on the basis of their frequency chang...

Mining spatio-temporal information on microblogging streams using a density-based online clustering method

Social networks have been regarded as a timely and cost-effective source of spatio-temporal information for many fields of application. However, while some research groups have successfully developed topic detection methods from the text streams for a while, and even som... (doi:10.1016/j.eswa.2012.02.136)

#FewThingsAboutIdioms: Understanding Idioms and Its Users in the Twitter Online Social Network

To help users find popular topics of discussion, Twitter periodically publishes ‘trending topics’ (trends) which are the most discussed keywords (e.g., hashtags) at a certain point of time. Inspection of the trends over several months reveals that while most of the trends are related to events in the off-line world, such as popular television shows, sports events, or emerging technologies, a si...

Journal title:
  • J. Comput. Science

Volume 6

Publication date: 2015